The recently developed digital signal processor (DSP) is a device for
implementing low- to medium-complexity speech coders. It is currently
being used to implement adaptive differential pulse-code modulation
(ADPCM) coding, two-band sub-band coding, and four-band sub-band
coding. This study was performed to determine optimal parameter
values for the two sub-band coders in preparation for their
implementation on the digital signal processor and to determine their
performance relative to ADPCM. (The actual implementations of the
ADPCM and two-band sub-band algorithms are discussed in other papers
in Part 2 of this issue of the Bell System Technical Journal.)
Performance was judged on the basis of segmental signal-to-noise
ratio and a forced-choice, subjective comparison test of the coders.
All three coders were simulated at bit rates of 16, 20, 24, 28, and 32
kb/s. The simulations were performed on a laboratory computer.

I. INTRODUCTION 

The recently developed DSP is a device for implementing low- to
medium-complexity speech coders. Three coders are currently being
implemented. The simplest coder is adaptive differential pulse-code
modulation (ADPCM), which is discussed in Ref. 1. The other two are in
the sub-band coder (SBC) family. Of these, the simpler one is two-band
sub-band coding (2B-SBC), featuring quadrature mirror filtering and
two equal bands. It is discussed in Ref. 2. The other coder, the most
complicated of the three, is four-band sub-band coding (4B-SBC),
featuring four equal bands. Its implementation is still in progress.

This report discusses the initial design parameters for the latter two
coders and the relative performance of all three. Segmental
signal-to-noise ratio (SNR) measurements were made on all three coders
via computer simulation at five different bit rates: 16, 20, 24, 28,
and 32 kb/s. In addition, 12 subjects ranked the coders in a
comparison test. The simulations reported here were carried out on a
laboratory computer as preparation for the implementation of the SBC
coders on the DSP.

Section II reviews the design of ADPCM and discusses the design of
the two sub-band coders. Section III discusses the segmental SNR
measurements and the results of the subjective testing experiment,
and Section IV gives the conclusions of this study.

II. DESIGN OF THE CODERS 
2.1 Design of ADPCM 

The ADPCM design simulated here is based on the design of Cummiskey
et al. (Ref. 3). A block diagram of the ADPCM coder described below is
shown in Fig. 1. The most significant change from the design in Ref. 3
is that only two multiplier values are used in changing the step-size,
regardless of the bit rate. This is based on the ADPCM implemented by
Johnston and Goodman (Ref. 4). This version of ADPCM has already been
implemented on the DSP and is described in Ref. 1. The ADPCM design
was also used for quantizing the sub-band signals in the other coders.
To simulate 20 kb/s and 28 kb/s ADPCM, alternating quantizers were
used. For 20 kb/s, the two quantizers used are for 2 and 3 bits (an
average of 2.5 bits per sample at the 8-kHz sampling rate). Since
the step-size is adapted based only on the most significant magnitude
bit, the same step-size adaptation algorithm is used for all samples.
The ratio of 2- to 3-bit quantizer step-size is held constant. This
requires one additional multiplication to convert from the 2- to 3-bit
step-size.

Fig. 1. Adaptive differential pulse-code-modulation coder used in simulation.
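The two-multiplier step-size rule described in this section can be
sketched as follows. This is a minimal illustration under stated
assumptions, not the simulated coder: the multiplier values (0.9 and
1.6), the initial and limiting step-sizes, the mid-rise quantizer, and
all function names are hypothetical choices made for the example, and a
single fixed word length is used rather than alternating quantizers.

```python
def _adapt(step, mag, half, m_down, m_up, lo, hi):
    # Expand the step when the most significant magnitude bit is set,
    # otherwise contract it: only two multiplier values are ever used.
    mult = m_up if mag >= half // 2 else m_down
    return min(max(step * mult, lo), hi)


def adpcm_encode(samples, bits=3, a=0.7, m_down=0.9, m_up=1.6,
                 step=16.0, lo=1.0, hi=2048.0):
    """Return (negative, magnitude) codes of `bits` bits each (bits >= 2)."""
    half = 1 << (bits - 1)              # number of magnitude levels
    prev, codes = 0.0, []
    for x in samples:
        pred = a * prev                 # first-order fixed prediction
        diff = x - pred
        mag = min(int(abs(diff) / step), half - 1)
        neg = diff < 0
        codes.append((neg, mag))
        # Track the decoder's reconstruction so both sides stay in step.
        prev = pred + (-1 if neg else 1) * (mag + 0.5) * step
        step = _adapt(step, mag, half, m_down, m_up, lo, hi)
    return codes


def adpcm_decode(codes, bits=3, a=0.7, m_down=0.9, m_up=1.6,
                 step=16.0, lo=1.0, hi=2048.0):
    """Rebuild the signal from the codes using the same adaptation rule."""
    half = 1 << (bits - 1)
    prev, out = 0.0, []
    for neg, mag in codes:
        prev = a * prev + (-1 if neg else 1) * (mag + 0.5) * step
        out.append(prev)
        step = _adapt(step, mag, half, m_down, m_up, lo, hi)
    return out
```

Because the adaptation looks only at the most significant magnitude
bit, the same `_adapt` routine serves any word length of two bits or
more; only the threshold `half // 2` changes with `bits`.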

2.2 Two-band SBC 

All sub-band coders are made from a few fundamental building
blocks. The first is linear filtering to divide the signal into two or
more sub-bands. These sub-bands can then be decimated to a lower
sampling rate than the original signal. Some form of quantization must
be used to encode each band. Interpolation and additional linear
filtering are used to bring each band back to the original sampling
rate and to its original place in the frequency spectrum. At this
point, the bands can be added together to produce an output signal.
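The building blocks listed above (band-splitting filters, decimation,
interpolation, and recombination) can be illustrated with a small
two-band sketch. It assumes a 2-tap filter pair as a stand-in, because
the coefficients of the longer filters used in the study are not given
here, and it omits the quantizer so that only the filter bank is
exercised. All names are hypothetical.

```python
import math


def fir(x, h):
    """Causal FIR filter with zero initial state; output has len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def mirror(h):
    """Mirror filter: negate every other tap, so H1(z) = H0(-z)."""
    return [-c if n % 2 else c for n, c in enumerate(h)]


def analyze(x, h0):
    """Split x into low and high bands and decimate each 2-to-1."""
    return fir(x, h0)[::2], fir(x, mirror(h0))[::2]


def synthesize(low, high, h0):
    """Interpolate both bands 1-to-2, filter, and add them together."""
    def upsample(band):
        out = []
        for s in band:
            out += [s, 0.0]             # insert a zero after every sample
        return out
    g1 = [-c for c in mirror(h0)]       # sign choice that cancels aliasing
    lo = fir(upsample(low), h0)
    hi = fir(upsample(high), g1)
    return [p + q for p, q in zip(lo, hi)]


H0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # assumed 2-tap stand-in filter
```

With no quantizer between `analyze` and `synthesize`, this stand-in
pair returns the input delayed by one sample. Longer filters give
sharper band separation at the cost of more computation and delay.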

The quadrature mirror filtering technique is fairly well known for its
use with sub-band coders. Each pair of quadrature mirror filters (QMFs)
produces two sub-bands of equal width in frequency. Johnston has
compiled a collection of QMFs of different lengths (Ref. 5). The
possible quantizers are adaptive delta modulation (ADM), ADPCM, and
adaptive pulse-code modulation (APCM). Each of these techniques is
fairly well known and understood, as are interpolation and
decimation. What remains is the task of combining these building
blocks in such a way that they fit on the DSP and give the best
possible performance. One of the tasks of this study was to choose
good candidates for implementation.

The 2B-SBC design is based on the 2B-SBC commentary-grade coder of
Johnston and Crochiere (Ref. 6). That coder was developed with the
object of maintaining a high-quality AM radio signal. Its parameters
were tuned to music rather than to speech. This section describes
parameters for a speech-bandwidth version. Five bit rates are
envisioned. For a more detailed discussion of sub-band coding in
general and the exact implementation of this coder, refer to Ref. 2.

Figure 2 is a block diagram of the 2B-SBC. The input speech has been
band-limited from 200 to 3200 Hz by a sharp bandpass filter and
sampled at 8000 Hz. A 32-tap QMF designed by Johnston (Ref. 5) is used
for separating the digitized speech into the two sub-bands. After
2-to-1 decimation on both bands, we found average correlations for
speech of 0.7 and -0.45 for the low and high bands, respectively.

Fig. 2. Two-band sub-band coder used in simulation.

The two bands are then coded using either ADPCM or ADM. The ADM is
used only for the higher band at low bit rates. It is based on the ADM
of Jayant (Ref. 7). The prediction coefficients used for these coders
were the correlation values mentioned above. Since the high band has a
negative correlation, it was frequency inverted before quantization by
the ADM, because this ADM requires a positive correlation for its
adaptation mechanism to work properly. Since frequency inversion just
means changing the sign of every other sample, this is a very minor
operation.
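A short sketch shows why changing the sign of every other sample turns
a negative correlation into a positive one. The function names are
hypothetical, and the correlation is taken here to be the normalized
correlation between adjacent samples, which is an assumption about how
the averages quoted above were measured.

```python
def frequency_invert(x):
    """Negate every other sample, which mirrors the spectrum of x."""
    return [-s if n % 2 else s for n, s in enumerate(x)]


def lag1_correlation(x):
    """Normalized correlation between adjacent samples."""
    num = sum(a * b for a, b in zip(x, x[1:]))
    den = sum(a * a for a in x)
    return num / den
```

Each product of adjacent samples changes sign under the inversion while
the energy is unchanged, so the correlation keeps its magnitude and
flips its sign. Applying the inversion a second time restores the
original samples.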

The next step was to determine optimal bit allocations for the low
and high bands. After experimenting with different bit allocations and
evaluating them on the basis of segmental SNR measurements and
informal listening, the following bit allocations (bits per sample in
each band) were adopted for the five bit rates:

Bit rate    Low band    High band
16 kb/s     3 bits      1 bit (ADM)
20 kb/s     4 bits      1 bit (ADM)
24 kb/s     5 bits      1 bit (ADM)
28 kb/s     5 bits      2 bits
32 kb/s     5 bits      3 bits

Some alternative designs were very close. For instance, a (4, 2)
allocation for 24 kb/s is almost as good as (5, 1) for the speech it
was tested on. If the speech were less sharply bandpass-filtered and
had more high-frequency content (as in telephone speech), the better
allocation for 24 kb/s might be (4, 2).

2.3 Four-band SBC design 

The 4B-SBC design described here is new, although it is a logical
extension of the 2B-SBC mentioned above. It starts with the same two
sub-bands as the two-band design. Both of these bands are then
divided into two new bands, yielding a total of four bands of equal
width. The filter used for the additional division in each band is the
16-tap QMF of Johnston designated C in Ref. 5. Figure 3 shows a block
diagram for this coder.

Fig. 3. Four-band sub-band coder used in simulation.

Once more the bands are quantized using ADPCM or ADM. Our
measurements of average correlation for speech data showed
correlations of 0.4, 0, 0, and 0.8 for the four bands going from lowest
to highest in frequency. The fourth band (3000 to 4000 Hz) has actually
been bandpass-filtered to cut off at 3200 Hz. As a result, it contains
little power and can be ignored for low-bit-rate coders. The
correlations of the two middle bands are zero, reflecting that the
long-term average of the speech spectrum from 1000 to 3000 Hz is flat.
If a prediction coefficient of zero is used with ADPCM, the result is
APCM. Thus, the two middle bands are APCM-encoded. The largest amount
of power is in the first band; therefore, it receives the most bits.

The bit allocations found to be the best by the same process of
segmental SNR measurements and casual listening were as follows
(bits per sample in bands 1 to 4):

Bit rate    Band 1    Band 2    Band 3    Band 4
16 kb/s     4         2         2         0
20 kb/s     5         2         2         1 (ADM)
24 kb/s     5         4         2         1
28 kb/s     7         4         2         1
32 kb/s     7         4         3         2
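As a consistency check, the bit allocations for both sub-band coders
can be converted back to total bit rates. The sketch assumes that each
of N equal bands is decimated to 1/N of the 8000-Hz input sampling
rate, so that an allocation of b bits per sample contributes b times
8000/N bits per second. The function and table names are hypothetical.

```python
def bit_rate_kbps(bits_per_band, fs_hz=8000):
    """Total bit rate of an N-band coder with critically sampled bands."""
    band_rate = fs_hz / len(bits_per_band)   # samples per second in each band
    return sum(bits_per_band) * band_rate / 1000


# Allocations copied from the two-band and four-band lists in the text.
TWO_BAND = {16: (3, 1), 20: (4, 1), 24: (5, 1), 28: (5, 2), 32: (5, 3)}
FOUR_BAND = {16: (4, 2, 2, 0), 20: (5, 2, 2, 1), 24: (5, 4, 2, 1),
             28: (7, 4, 2, 1), 32: (7, 4, 3, 2)}
```

Every entry reproduces its nominal rate; for example, the 24-kb/s
four-band allocation gives (5 + 4 + 2 + 1) x 2000 = 24 000 b/s.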

The greatest amount of error occurs in the lowest band. Even at the
high rates (28 and 32 kb/s) this error is still perceptible as a low
rumbling noise. However, it was found that a high-pass filter with a
cutoff of 200 Hz eliminated this problem. The filter used was a 121-tap
FIR filter. Table I gives the coefficients, and Fig. 4 shows the
frequency response. A much smaller IIR filter could also be used to do
the same job (Ref. 2). Note that the above bit assignments were made
without using the FIR filter. With a high-pass filter, fewer bits could
be allocated to the lowest band and more to bands two and three.
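For readers who want to experiment with a comparable filter, the
sketch below designs a symmetric 121-tap high-pass FIR filter with a
200-Hz cutoff at an 8000-Hz sampling rate. The design method (a
Hamming-windowed sinc low-pass filter subtracted from a unit impulse)
is an assumption made for illustration; it is not claimed to reproduce
the coefficients of Table I. The function names are hypothetical.

```python
import math


def highpass_fir(num_taps=121, cutoff_hz=200.0, fs_hz=8000.0):
    """Symmetric high-pass FIR filter; num_taps must be odd."""
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs_hz                   # cutoff in cycles per sample
    low = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        low.append(ideal * window)           # windowed low-pass prototype
    dc = sum(low)
    low = [c / dc for c in low]              # unit gain at 0 Hz
    high = [-c for c in low]
    high[mid] += 1.0                         # unit impulse minus low-pass
    return high


def gain(h, freq_hz, fs_hz=8000.0):
    """Magnitude of the frequency response of h at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(c * math.cos(w * n) for n, c in enumerate(h))
    im = -sum(c * math.sin(w * n) for n, c in enumerate(h))
    return math.hypot(re, im)
```

The resulting filter blocks 0 Hz and passes the speech band with a
gain close to one, at the cost of a 60-sample delay.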

2.4 Relative complexity of designs 

The ADPCM designs for 16, 24, and 32 kb/s have already been
implemented on the DSP (Ref. 1). The combined encoder and decoder
algorithms use 48 percent of the DSP real-time capability for a
sampling rate of 8 kHz. An even smaller percentage of RAM and ROM is
used. The 2B-SBC based on the design parameters reported here has
also been implemented on the DSP chip (Ref. 2). It uses 98 percent of
the real-time capability and 78 percent of the RAM. It includes an IIR
bandpass filter for the input. The 4B-SBC algorithm is planned for
implementation in the near future. Since all of the major portions of
the 4B-SBC have been programmed already for the 2B-SBC
implementation, it is possible to project how much of the DSP will be
used. Both the transmitter and the receiver will require a DSP, and
each will use about the same fractions of real-time capability and RAM
as the complete 2B-SBC algorithm. Therefore, we might classify the
three coders as having complexities of roughly 0.5, 1, and 2 DSPs,
respectively.

Table I. Coefficients for symmetric FIR high-pass filter

Fig. 4. Frequency response of high-pass filter for 4B-SBC.

III. RELATIVE PERFORMANCE OF THE CODERS 

Since the SBC designs are more complex, a demonstration of their
improved performance over ADPCM was needed to justify their
implementation on the DSP. To demonstrate their relative performance,
all three coders were simulated on a laboratory computer. Each
processed speech from a stored file. The results were evaluated by
both an objective and a subjective measure. The objective measure was
segmental SNR, while the subjective measure was a forced-choice (A-B)
comparison test in which all possible pairs of coders were compared.

Six phonetically balanced sentences were used for evaluating the 
coders. Three were spoken by male speakers and three by females. 
They were recorded using a linear microphone. They were band- 
limited from 200 to 3200 Hz and sampled at 8000 Hz using a 15-bit 
linear quantizer. 

3.1 Segmental signal-to-noise ratio results

In computing segmental SNR measurements, blocks of speech of 32 ms
(256 samples at 8000 Hz) were used. The ADPCM coder was compared with
the original input speech. The SBC coders were compared with
reassembled speech which had been processed by the appropriate QMF
filtering, but with no quantization. These slightly modified speech
signals cannot be distinguished from the original in casual listening.
Without these references it would be difficult to make a fair
comparison of the three coders on the basis of SNR. The measurements
on 4B-SBC were made before the 121-tap FIR high-pass filtering.
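The segmental SNR measure can be sketched as the average, over 32-ms
blocks, of the SNR in decibels computed separately within each block.
The handling of blocks with zero signal or zero error energy (they are
skipped) and of a trailing partial block (it is dropped) are
assumptions made for the example, as is the function name.

```python
import math


def segmental_snr_db(reference, coded, fs_hz=8000, block_ms=32):
    """Mean of per-block SNR values in dB over consecutive blocks."""
    n = fs_hz * block_ms // 1000             # 256 samples per block at 8 kHz
    snrs = []
    for start in range(0, len(reference) - n + 1, n):
        ref = reference[start:start + n]
        out = coded[start:start + n]
        sig_e = sum(r * r for r in ref)
        err_e = sum((r - c) ** 2 for r, c in zip(ref, out))
        if sig_e > 0 and err_e > 0:          # assumed: skip degenerate blocks
            snrs.append(10 * math.log10(sig_e / err_e))
    if not snrs:
        raise ValueError("no usable blocks")
    return sum(snrs) / len(snrs)
```

For the sub-band coders, `reference` would be the speech reassembled
by the filter bank without quantization, as described above, rather
than the original input.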

The results of these measurements are summarized in Fig. 5. They
show that the more complex SBC coders have a definite advantage over
ADPCM at the lower bit rates. Interestingly, at 32 kb/s ADPCM beats
both of the more complex coders. The 4B-SBC maintains a fairly
constant 2-dB advantage over 2B-SBC. In terms of bit rate, this
translates to 4 kb/s. At the low rates, the 4B-SBC has about a 6-kb/s
advantage over ADPCM.

3.2 Subjective testing of the three coders 

An A-B comparison test was performed to rank the three coders.
Each coder at each rate was compared twice against every other coder
at every rate, as well as against the original. In the two comparisons
of a given pair, each member was played in first position once. There
were 12 participants in the test, and altogether there were 240
comparisons. The test was broken down into two parts, one with 110
comparisons, the other with 130. The participants listened over
headphones in a soundproof booth. The participants were also divided
into two groups of six. If one group listened to a particular A-B
comparison with a female speaker, the other group heard a sentence
with a male speaker, and vice versa. Thus, we attempted to make a
totally balanced and unbiased test.

Fig. 5. Segmental SNR measurements for three coders.

Table II gives the individual coder versus coder comparisons. In
addition, an overall preference ranking was computed based on the
total number of votes received by each coder. In all, a total of 360
votes could be received by any coder. Figure 6 shows the percentage
of the 360 possible votes received by each coder. This result is in
good agreement with the segmental SNR results of Fig. 5. For example,
both sub-band coders show an advantage over ADPCM at the low rates,
and ADPCM catches up with or passes them at the high rates.
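The counts of 240 comparisons and 360 possible votes follow directly
from the test design, as the short calculation below shows. It assumes
that the 16 stimuli are the three coders at five rates each plus the
original, and that every listener casts one vote per comparison. The
variable names are illustrative only.

```python
from itertools import combinations

RATES_KBPS = (16, 20, 24, 28, 32)
CODERS = ("ADPCM", "2B-SBC", "4B-SBC")
LISTENERS = 12

stimuli = [(c, r) for c in CODERS for r in RATES_KBPS] + [("original", None)]
pairs = list(combinations(stimuli, 2))       # every unordered pair of stimuli

comparisons = 2 * len(pairs)                 # each pair is played in both orders
votes_possible = 2 * (len(stimuli) - 1) * LISTENERS   # per coder
```

Here `comparisons` evaluates to 240, matching the 110 plus 130
comparisons of the two test parts, and `votes_possible` evaluates to
360.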

Some of the more significant results are the following:

(i) The 4B-SBC has an 8-kb/s perceptual advantage over ADPCM at
the low rates. The 24-kb/s ADPCM has been used for voice storage and
playback systems (Ref. 8). This result indicates that 16-kb/s 4B-SBC
could be substituted at a 33 percent savings in storage or,
equivalently, a 50 percent increase in message storage capability.
Moreover, at 20 kb/s, 2B-SBC has a 4-kb/s perceptual advantage over
ADPCM.

(ii) Although 4B-SBC lost to ADPCM at 32 kb/s in SNR measurements,
it beat ADPCM in the subjective tests. In addition, in direct
comparisons with the original, 32-kb/s 4B-SBC received 37.5 percent of
the votes, an almost equipreferential rating. This indicates that it
is of high quality. Since 32-kb/s ADPCM is often described as toll
quality, 32-kb/s 4B-SBC also deserves this ranking.

(iii) The 2B-SBC seems to provide a good alternative to ADPCM at
the low bit rates for a modest increase in complexity.
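The storage figures quoted in result (i) can be checked with a
two-line calculation. The function name is hypothetical; the only
assumption is that storage is proportional to bit rate for a message
of fixed duration.

```python
def storage_tradeoff(old_kbps, new_kbps):
    """Return (fractional storage saving, fractional capacity increase)."""
    saving = 1 - new_kbps / old_kbps         # less storage per message
    capacity_gain = old_kbps / new_kbps - 1  # more messages in the same store
    return saving, capacity_gain
```

Replacing a 24-kb/s coder with a 16-kb/s coder gives a saving of one
third and a capacity increase of one half, in agreement with the 33
and 50 percent figures above.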

IV. CONCLUSIONS 

We have presented measurement data and simulation results for use
in implementing two sub-band coders on the DSP. This data has already
been used to design and implement the 2B-SBC and is being used for a
planned 4B-SBC implementation. Simulations of these candidate coders
were made on a laboratory computer. The results of these simulations
indicate that 2B-SBC and 4B-SBC have important advantages over ADPCM
at low bit rates. This advantage is as much as 8 kb/s for 4B-SBC and
4 kb/s for 2B-SBC. The 16-kb/s 4B-SBC could be substituted for
24-kb/s ADPCM in a voice storage and playback system. In addition,
4B-SBC at 32 kb/s is rated as better in quality than 32-kb/s ADPCM.

Since the complexity of these coders is within an order of magnitude
of that of ADPCM, they should be considered as viable alternatives.